Abstract

Matrix rank minimization subject to affine constraints arises in many application areas, ranging from signal
processing to machine learning. The nuclear norm is a convex relaxation of this problem that can recover the rank
exactly under some restricted and theoretically interesting conditions. However, for many real-world applications,
the nuclear norm approximation to the rank function can only produce a result far from the optimum. To seek a solution
of higher accuracy than the nuclear norm, in this paper we propose a rank approximation based on the
logarithm-determinant (LogDet). We consider using this rank approximation for subspace clustering. Our framework can
model different kinds of errors and noise. An effective optimization strategy is developed with a theoretical guarantee
of convergence to a stationary point. The proposed method gives promising results on face clustering and motion
segmentation tasks compared to state-of-the-art subspace clustering algorithms.

Index Terms 

Subspace clustering, matrix rank minimization, nuclear norm, nonconvex optimization.


I. Introduction 


Recently there has been a surge of interest in finding a minimum-rank matrix within an affine
constraint set [1], [2]. The problem is as follows:

$$\min_{Z} \ \operatorname{rank}(Z) \quad \text{s.t.} \quad \mathcal{A}(Z) = b, \tag{1}$$

where $Z \in \mathbb{R}^{m \times n}$ is the unknown matrix, $\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^{p}$ is a linear mapping, and $b \in \mathbb{R}^{p}$ denotes
the observations. Unfortunately, minimizing the rank of a matrix is known to be NP-hard and is a
very challenging problem.

Consequently, a widely used convex relaxation approach is to replace the rank function with the nuclear
norm $\|Z\|_* = \sum_{i=1}^{n} \sigma_i(Z)$, where $\sigma_i(Z)$ is the $i$-th singular value of $Z$ (suppose $n < m$). The nuclear
norm technique has been shown to be effective in encouraging a low-rank solution [3], [4]. Nevertheless,
in many interesting circumstances there is no guarantee that the minimum nuclear norm solution coincides
with the minimum-rank solution; whether it does depends heavily on the singular values of the matrices in the
null space of $\mathcal{A}$. Some variations of the nuclear norm have demonstrated promising results, e.g., singular
value thresholding [5] and the truncated nuclear norm [6].

Another popular alternative approach is to compute

$$\min_{Z} \ \sum_{i=1}^{n} h(\sigma_i(Z)) \quad \text{s.t.} \quad \mathcal{A}(Z) = b, \tag{2}$$

where $h$ is usually a nonconvex and nonsmooth function. It has been observed that nonconvex approaches
can succeed in a broader range of scenarios [7]. However, nonconvex optimization is often challenging.

To overcome the above-mentioned difficulties, in this paper, we propose to use a particular
log-determinant (LogDet) function to approximate the rank function. The formulation we consider is:

$$F(Z) = \operatorname{logdet}(I + Z^T Z) = \sum_{i=1}^{n} \log\left(1 + \sigma_i^2(Z)\right), \tag{3}$$

where $I \in \mathbb{R}^{n \times n}$ is an identity matrix. For large nonzero singular values, the LogDet function value will be
much smaller than the nuclear norm. It is easy to show that $\operatorname{logdet}(I + Z^T Z) \le \|Z\|_*$, since
$\log(1 + \sigma^2) \le \sigma$ for every $\sigma \ge 0$. Therefore, LogDet
is a tighter rank approximation function than the nuclear norm. Although a similar function $\operatorname{logdet}(Z + \delta I)$
has been proposed and iterative linearization has been used to find a local minimum [8], its $\delta$ is required to
be small (e.g., 10 “®), which leads to a significantly biased approximation for small singular values and thus
limited applications. The smoothed Schatten-$p$ function, $\operatorname{Tr}(Z^T Z + \gamma I)^{p/2}$, has been well studied in matrix
completion [9]; nonetheless, the resulting algorithm is rather sensitive to the parameter $\gamma$.
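The comparison between the rank, the LogDet surrogate in (3), and the nuclear norm can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names and the random rank-3 test matrix are hypothetical and not part of the paper.

```python
import numpy as np

def logdet_rank_surrogate(Z):
    """F(Z) = logdet(I + Z^T Z), evaluated as sum_i log(1 + sigma_i(Z)^2)."""
    sigma = np.linalg.svd(Z, compute_uv=False)
    return float(np.sum(np.log1p(sigma ** 2)))

def nuclear_norm(Z):
    """||Z||_* = sum of the singular values of Z."""
    return float(np.sum(np.linalg.svd(Z, compute_uv=False)))

# Hypothetical test matrix: size 40 x 30, rank 3, with large singular values.
rng = np.random.default_rng(0)
Z = 5.0 * rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))

rank = int(np.linalg.matrix_rank(Z))
F = logdet_rank_surrogate(Z)
nuc = nuclear_norm(Z)

# Cross-check the singular-value form against the determinant form of (3).
_, F_det = np.linalg.slogdet(np.eye(Z.shape[1]) + Z.T @ Z)

print(f"rank = {rank}, LogDet = {F:.2f}, nuclear norm = {nuc:.2f}")
```

For a matrix like this one, with a few large singular values, the LogDet value grows only logarithmically in each singular value while the nuclear norm grows linearly, which is the sense in which the surrogate is tighter.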

The main contributions of this work are as follows: 1) an efficient algorithm is devised to optimize
the LogDet-associated nonconvex objective function; 2) our method pushes the accuracy of subspace
clustering to a new level.


II. Problem statement of subspace clustering

An important application of the proposed LogDet function is the low-rank representation based subspace
clustering problem. There has been significant research effort on this subject over the past several years due
to its promising applications in computer vision and machine learning [10]. Subspace clustering aims at
finding a low-dimensional subspace for each group of points, based on the widely used assumption
that high-dimensional data actually reside in a union of multiple low-dimensional subspaces. Under such
an assumption the data can be separated in a projected subspace. Consequently, subspace clustering
mainly involves two tasks: first, projecting the data into a latent subspace to describe the affinities of
points, and second, grouping the data in that subspace. Spectral clustering methods such as
normalized cuts (NCuts) [11] are usually used in the second task to find the cluster membership. Besides
these spectral clustering-based subspace clustering methods, iterative, algebraic, and statistical methods are
also available in the literature [10], but they are usually sensitive to initialization, noise, or outliers.

Typical spectral clustering-based subspace clustering methods are Local Subspace Affinity (LSA) [12],
Sparse Subspace Clustering (SSC) [13], Low Rank Representation (LRR) [2], [14], and its more robust
variant LRSC [15], [16]. Among them, SSC and LRR give promising results even in the presence of
large outliers or corruption [17], [18]. They both suppose that each data point can be written as a linear
combination of other points in the dataset. SSC tries to find the sparsest representation of data points
through the $\ell_1$-norm. Even when the subspaces overlap, SSC can successfully reveal the subspace structure [19].
However, SSC's solution is sometimes too sparse to form a fully connected affinity graph for data in a single
subspace [20]. LRR uses the lowest-rank representation to depict the similarity among data points. It is
theoretically guaranteed to succeed when the subspaces are independent.

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ store a set of $n$ $m$-dimensional samples drawn from a union of $k$
subspaces. LRR considers the following regularized nuclear norm rank minimization problem:

$$\min_{Z,E} \ \|Z\|_* + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad X = XZ + E, \tag{4}$$

where $\lambda > 0$ is a parameter, $E$ represents unknown corruption, and $\|\cdot\|_{\ell}$ can be the $\ell_{2,1}$-norm, the $\ell_1$-norm, or the
squared Frobenius norm. Specifically, if random corruption is assumed in the data, $\|E\|_1 := \sum_{i=1}^{m}\sum_{j=1}^{n} |E_{ij}|$
is usually adopted; $\|E\|_{2,1} := \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} E_{ij}^2}$ is more suitable to characterize sample-specific
corruptions and outliers; $\|E\|_F^2 := \sum_{i=1}^{m}\sum_{j=1}^{n} E_{ij}^2$ often describes Gaussian noise. LRR is able to produce quite
competitive performance on subspace clustering in the current literature. However, its solution might
not be unique due to the nuclear norm [21]; furthermore, the rank surrogate can deviate far from the
true rank function.
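The three choices of error norm in (4) differ only in how the entries of $E$ are aggregated. As a small illustration, assuming NumPy and that the columns of $E$ correspond to samples, they could be computed as follows; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def l1_norm(E):
    """Sum of absolute entries: models sparse random corruption."""
    return float(np.abs(E).sum())

def l21_norm(E):
    """Sum of column-wise Euclidean norms: models sample-specific corruption."""
    return float(np.linalg.norm(E, axis=0).sum())

def squared_frobenius_norm(E):
    """Sum of squared entries: models Gaussian noise."""
    return float((E ** 2).sum())

# Toy error matrix with three samples (columns); only the first is heavily corrupted.
E_toy = np.array([[3.0, 0.0, 0.0],
                  [4.0, 0.0, 1.0]])

print(l1_norm(E_toy), l21_norm(E_toy), squared_frobenius_norm(E_toy))
```

On this toy matrix the $\ell_{2,1}$-norm charges the corrupted first column its Euclidean length 5 rather than 7, which is why it favors solutions where whole columns of $E$ are either zero or nonzero.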

To better approximate the rank while possessing the desired robustness similar to LRR, in this paper,
we propose to use the above-mentioned LogDet function and solve the following problem:

$$\min_{Z,E} \ \operatorname{logdet}(I + Z^T Z) + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad X = XZ + E. \tag{5}$$

The objective function of (5) is nonconvex. We design an effective optimization strategy based on an
augmented Lagrangian multiplier (ALM) method, which is potentially applicable to large-scale data
because of the decomposability of ALM and its amenability to parallel algorithms. For our optimization
method, we provide a theoretical analysis of its convergence, which mathematically guarantees that our
algorithm can produce a convergent subsequence and that the converged point is a stationary point of (5).

III. Proposed method: CLAR 

In this section, we present the proposed robust subspace clustering algorithm CLAR: Clustering with 
Log-determinant Approximation to Rank. The basic theorems and optimization algorithm will be presented 
below. 

A. Smoothed rank minimization 

To make the objective function in (5) separable, we first convert it to the following equivalent problem
by introducing an auxiliary variable $J$:

$$\min_{E,J,Z} \ \operatorname{logdet}(I + J^T J) + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad X = XZ + E, \ Z = J. \tag{6}$$

We can solve problem (6) using a type of ALM method. The corresponding augmented Lagrangian
function is

$$\begin{aligned} L(E, J, Y_1, Y_2, Z, \mu) = {} & \operatorname{logdet}(I + J^T J) + \lambda \|E\|_{\ell} \\ & + \operatorname{Tr}\left(Y_1^T (J - Z)\right) + \operatorname{Tr}\left(Y_2^T (X - XZ - E)\right) \\ & + \frac{\mu}{2}\left(\|J - Z\|_F^2 + \|X - XZ - E\|_F^2\right), \end{aligned} \tag{7}$$

where $Y_1$ and $Y_2$ are Lagrange multipliers, and $\mu > 0$ is a penalty parameter. Then we can apply the
alternating minimization idea to update one of the variables with the others fixed.

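Given the current point $E^t$, $J^t$, $Z^t$, $Y_1^t$, $Y_2^t$, the updating scheme is:

$$\begin{aligned} Z^{t+1} = \arg\min_{Z} \ & \operatorname{Tr}\left[(Y_1^t)^T (J^t - Z)\right] + \frac{\mu^t}{2}\|J^t - Z\|_F^2 \\ & + \operatorname{Tr}\left[(Y_2^t)^T (X - XZ - E^t)\right] + \frac{\mu^t}{2}\|X - XZ - E^t\|_F^2, \\ J^{t+1} = \arg\min_{J} \ & \operatorname{logdet}(I + J^T J) + \frac{\mu^t}{2}\left\|J - \left(Z^{t+1} - \frac{Y_1^t}{\mu^t}\right)\right\|_F^2, \\ E^{t+1} = \arg\min_{E} \ & \lambda \|E\|_{\ell} + \operatorname{Tr}\left[(Y_2^t)^T (X - XZ^{t+1} - E)\right] + \frac{\mu^t}{2}\|X - XZ^{t+1} - E\|_F^2. \end{aligned}$$

The first equation above has a closed-form solution:

$$Z^{t+1} = (I + X^T X)^{-1}\left[X^T (X - E^t) + J^t + \frac{Y_1^t + X^T Y_2^t}{\mu^t}\right]. \tag{8}$$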
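The closed form (8) follows from setting the gradient of the $Z$-subproblem to zero. The sketch below, assuming NumPy, implements the update and checks on random data that the gradient indeed vanishes; the function names and the random test sizes are illustrative assumptions.

```python
import numpy as np

def update_Z(X, J, E, Y1, Y2, mu):
    """Closed-form Z update of equation (8), using a linear solve instead of an explicit inverse."""
    n = X.shape[1]
    rhs = X.T @ (X - E) + J + (Y1 + X.T @ Y2) / mu
    return np.linalg.solve(np.eye(n) + X.T @ X, rhs)

def z_subproblem_gradient(Z, X, J, E, Y1, Y2, mu):
    """Gradient with respect to Z of the Z-subproblem objective."""
    residual = X - X @ Z - E
    return -Y1 - mu * (J - Z) - X.T @ Y2 - mu * X.T @ residual

# Random check: m = 8 features, n = 5 samples (hypothetical sizes).
rng = np.random.default_rng(0)
m, n, mu = 8, 5, 0.4
X = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))
E = rng.standard_normal((m, n))
Y1 = rng.standard_normal((n, n))
Y2 = rng.standard_normal((m, n))

Z_new = update_Z(X, J, E, Y1, Y2, mu)
grad_norm = np.linalg.norm(z_subproblem_gradient(Z_new, X, J, E, Y1, Y2, mu))
print(grad_norm)
```

Because $I + X^T X$ does not change across iterations, an implementation could factor it once and reuse the factorization; the sketch re-solves each time for brevity.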

For the $J$ update, the problem can be converted to scalar minimization problems by the following theorem,
which is proved in the supplementary material.

Theorem 1. If $F(Z)$ is a unitarily invariant function and the SVD of $A$ is $A = U \Sigma_A V^T$, then the optimal
solution to the following problem

$$\min_{Z} \ F(Z) + \frac{\mu}{2}\|Z - A\|_F^2 \tag{9}$$

is $Z^*$ with SVD $U \Sigma_Z^* V^T$, where $\Sigma_Z^* = \operatorname{diag}(\sigma^*)$; moreover, if $F(Z) = f \circ \sigma(Z)$, where $\sigma(Z)$ is the vector of
nonincreasing singular values of $Z$, then $\sigma^*$ is obtained by using the Moreau-Yosida proximity operator
$\sigma^* = \operatorname{prox}_{f,\mu}(\sigma_A)$, where $\sigma_A := \operatorname{diag}(\Sigma_A)$, and

$$\operatorname{prox}_{f,\mu}(\sigma_A) := \arg\min_{\sigma} \ f(\sigma) + \frac{\mu}{2}\|\sigma - \sigma_A\|_2^2. \tag{10}$$

Algorithm 1 Smoothed Rank Minimization

Input: data matrix $X \in \mathbb{R}^{m \times n}$, parameters $\lambda > 0$, $\mu^0 > 0$, $\gamma > 1$.
Initialize: $J = I \in \mathbb{R}^{n \times n}$, $E = 0$, $Y_1 = Y_2 = 0$.
REPEAT
1: Obtain $Z$ through (8).
2: Update $J$ as (12).
3: Solve $E$ by either (13), (14), or (15) according to $\ell$.
4: Update the multipliers:
   $Y_1^{t+1} = Y_1^t + \mu^t (J^{t+1} - Z^{t+1})$,
   $Y_2^{t+1} = Y_2^t + \mu^t (X - XZ^{t+1} - E^{t+1})$.
5: Update the parameter $\mu^t$ by $\mu^{t+1} = \gamma \mu^t$.
UNTIL stopping criterion is met.


According to the first-order optimality condition, the gradient of the objective function of (10) with
respect to each singular value should vanish. Thus we have

$$\frac{2\sigma_i}{1 + \sigma_i^2} + \mu^t (\sigma_i - \sigma_{i,A}) = 0, \quad \text{s.t. } \sigma_i \ge 0, \ \text{for } i = 1, \ldots, n. \tag{11}$$

The above equation is cubic and gives three roots. If $\sigma_{i,A} = 0$, the minimizer $\sigma_i^*$ will be 0; otherwise,
it can be shown that there is a unique minimizer $\sigma_i^* \in [0, \sigma_{i,A})$ if $\mu^t > 1/4$. To ensure this requirement is
satisfied, we adopt $\mu^0 = 0.4$ in our experiments. Finally, we obtain the update of the $J$ variable with

$$J^{t+1} = U \operatorname{diag}(\sigma_1^*, \ldots, \sigma_n^*) V^T. \tag{12}$$

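Depending on different regularization strategies, we have different closed-form solutions for $E$. For the squared
Frobenius norm,

$$E^{t+1} = \frac{Y_2^t + \mu^t (X - XZ^{t+1})}{\mu^t + 2\lambda}. \tag{13}$$

For the $\ell_1$-norm, according to [|2^, if we define $Q = X - XZ^{t+1} + \frac{Y_2^t}{\mu^t}$, then $E$ can be updated element-wise
as:

$$E_{ij}^{t+1} = \begin{cases} Q_{ij} - \frac{\lambda}{\mu^t} \operatorname{sgn}(Q_{ij}), & \text{if } |Q_{ij}| > \frac{\lambda}{\mu^t}, \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

For the $\ell_{2,1}$-norm, by u2M, we have

$$[E^{t+1}]_{:,i} = \begin{cases} \dfrac{\|Q_{:,i}\|_2 - \frac{\lambda}{\mu^t}}{\|Q_{:,i}\|_2} \, Q_{:,i}, & \text{if } \|Q_{:,i}\|_2 > \frac{\lambda}{\mu^t}, \\ 0, & \text{otherwise.} \end{cases} \tag{15}$$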
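The three closed-form updates (13), (14), and (15) can be collected in one routine. The following sketch assumes NumPy; the function name `update_E`, the string keys for the norm, and the small toy input are assumptions made for illustration.

```python
import numpy as np

def update_E(X, Z, Y2, mu, lam, norm="l1"):
    """E update for the squared Frobenius norm (13), l1-norm (14), or l2,1-norm (15)."""
    if norm == "fro":
        return (Y2 + mu * (X - X @ Z)) / (mu + 2.0 * lam)
    Q = X - X @ Z + Y2 / mu
    tau = lam / mu
    if norm == "l1":
        # Element-wise soft thresholding.
        return np.sign(Q) * np.maximum(np.abs(Q) - tau, 0.0)
    if norm == "l21":
        # Column-wise shrinkage; the floor avoids dividing by zero for empty columns.
        col_norms = np.linalg.norm(Q, axis=0)
        scale = np.maximum(col_norms - tau, 0.0) / np.maximum(col_norms, 1e-12)
        return Q * scale
    raise ValueError("norm must be 'fro', 'l1', or 'l21'")

# Toy input chosen so that Q equals Q0 exactly (X = 0, mu = 1, Y2 = Q0).
Q0 = np.array([[3.0, 0.5, 0.0],
               [4.0, 0.0, -2.0]])
X0, Z0 = np.zeros((2, 3)), np.zeros((3, 3))

E_l1 = update_E(X0, Z0, Q0, mu=1.0, lam=1.0, norm="l1")
E_l21 = update_E(X0, Z0, Q0, mu=1.0, lam=1.0, norm="l21")
E_fro = update_E(X0, Z0, Q0, mu=1.0, lam=1.0, norm="fro")
print(E_l1, E_l21, E_fro, sep="\n")
```

With threshold $\lambda / \mu^t = 1$, the $\ell_1$ update zeroes the entry 0.5 individually, while the $\ell_{2,1}$ update zeroes the whole second column because its Euclidean norm is below the threshold.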


The complete procedure for solving problem (5) is summarized in Algorithm 1. Since our objective
function is nonconvex, it is difficult to give a rigorous mathematical proof of convergence to a (local)
optimum. As we show in the supplementary material, our algorithm converges to an accumulation point,
and this accumulation point is a stationary point. Our experiments confirm the convergence of the proposed
method. The experimental results are promising, even though the solution obtained by the proposed
optimization method may be only a local optimum.
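Putting the steps of Algorithm 1 together gives a short loop. The sketch below is not the authors' released code: it assumes NumPy, uses only the squared Frobenius error term (13), takes hypothetical defaults for `lam`, and stops on the size of the constraint residuals, which is an assumed stopping rule rather than the one used in the paper.

```python
import numpy as np

def prox_logdet(s, mu):
    """Entrywise minimizer of log(1 + x^2) + (mu / 2) * (x - s_i)^2 over x >= 0."""
    out = np.zeros_like(s)
    for i, a in enumerate(s):
        if a <= 0.0:
            continue
        r = np.roots([mu, -mu * a, mu + 2.0, -mu * a])   # cubic from (11)
        r = r[np.abs(r.imag) < 1e-8].real
        c = np.append(r[(r >= 0.0) & (r <= a)], 0.0)
        out[i] = c[np.argmin(np.log1p(c ** 2) + 0.5 * mu * (c - a) ** 2)]
    return out

def clar_alm(X, lam=0.1, mu=0.4, gamma=1.1, max_iter=100, tol=1e-6):
    """ALM loop following Algorithm 1 with the squared Frobenius error model."""
    m, n = X.shape
    J, E = np.eye(n), np.zeros((m, n))
    Y1, Y2 = np.zeros((n, n)), np.zeros((m, n))
    Z = np.zeros((n, n))
    gram = np.eye(n) + X.T @ X
    for _ in range(max_iter):
        # Step 1: Z update, equation (8).
        Z = np.linalg.solve(gram, X.T @ (X - E) + J + (Y1 + X.T @ Y2) / mu)
        # Step 2: J update, equation (12).
        U, s, Vt = np.linalg.svd(Z - Y1 / mu, full_matrices=False)
        J = (U * prox_logdet(s, mu)) @ Vt
        # Step 3: E update, equation (13).
        E = (Y2 + mu * (X - X @ Z)) / (mu + 2.0 * lam)
        # Step 4: multiplier updates.
        R1, R2 = J - Z, X - X @ Z - E
        Y1 = Y1 + mu * R1
        Y2 = Y2 + mu * R2
        # Step 5: increase the penalty parameter.
        mu *= gamma
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:   # assumed stopping rule
            break
    return Z, E

# Hypothetical toy data: 8 points from each of two 2-dimensional subspaces of R^10.
rng = np.random.default_rng(1)
X_toy = np.hstack([rng.standard_normal((10, 2)) @ rng.standard_normal((2, 8)),
                   rng.standard_normal((10, 2)) @ rng.standard_normal((2, 8))])
Z_toy, E_toy = clar_alm(X_toy)
print(np.abs(X_toy - X_toy @ Z_toy - E_toy).max())
```

Swapping in the $\ell_1$ or $\ell_{2,1}$ error model only changes the line marked as step 3.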
B. Subspace segmentation

After we obtain the coefficient matrix $Z^*$, we consider constructing a similarity graph matrix $W$ from
it, since postprocessing of the coefficient matrix often improves the clustering performance [13]. Using
the angular information based technique in [14], we define $\tilde{U} = U S^{1/2}$, where $U$ and $S$ are from the
skinny SVD of $Z^* = U S V^T$. Inspired by [25], we define $W$ as:

$$W_{ij} = \left( \frac{\tilde{u}_i^T \tilde{u}_j}{\|\tilde{u}_i\|_2 \, \|\tilde{u}_j\|_2} \right)^{2\phi}, \tag{16}$$

where $\tilde{u}_i$ and $\tilde{u}_j$ stand for the $i$-th and $j$-th columns of $\tilde{U}$, and $\phi \in \mathbb{N}^*$ tunes the sharpness of the affinity
between two points. However, an excessively large $\phi$ would break affinities between points of the same
group. $\phi = 2$ is used in our experiments, and thus we have the same post-processing procedure as LRR.¹
After obtaining $W$, we directly utilize NCuts to cluster the samples.
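The affinity construction in (16) can be sketched as follows, assuming NumPy. One indexing assumption is made explicit here: sample $i$ is taken to correspond to the $i$-th row of the $n \times r$ matrix $U S^{1/2}$, which matches the "columns" wording of the text if $\tilde{U}$ is stored transposed. The function name and the rank tolerance are hypothetical.

```python
import numpy as np

def affinity_from_coefficients(Z, phi=2, rank_tol=1e-8):
    """Build W of equation (16) from a coefficient matrix Z via its skinny SVD."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    # Skinny SVD: keep only numerically nonzero singular values (assumed tolerance).
    r = max(1, int(np.sum(s > rank_tol * max(s[0], 1e-300))))
    U_tilde = U[:, :r] * np.sqrt(s[:r])          # one row per sample (assumption)
    norms = np.maximum(np.linalg.norm(U_tilde, axis=1, keepdims=True), 1e-12)
    cosines = (U_tilde / norms) @ (U_tilde / norms).T
    return np.clip(cosines, -1.0, 1.0) ** (2 * phi)

# Idealized block-diagonal coefficient matrix: two groups of 3 and 2 samples.
Z_blocks = np.zeros((5, 5))
Z_blocks[:3, :3] = 1.0
Z_blocks[3:, 3:] = 1.0

W = affinity_from_coefficients(Z_blocks, phi=2)
print(np.round(W, 3))
```

For this idealized input the affinity is 1 within each group and 0 across groups, so any spectral clustering routine applied to `W` would separate the two groups; the NCuts step itself is not shown.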



Fig. 1. Sample face images in Extended Yale B. 


IV. Experiment 

In this section, we apply CLAR to subspace clustering on two benchmark databases: the Extended
Yale B database (EYaleB) [26] and the Hopkins 155 motion database [27]. CLAR is compared with the
state-of-the-art subspace clustering algorithms SSC, LSA, LRR, and LRSC. The segmentation error rate is
used to evaluate the subspace clustering performance; it is defined as the percentage of erroneously
clustered samples relative to the total number of samples in the data set being considered. The parameters
are tuned to achieve the best performance. In general, when the corruptions or noise are slight, the value
of $\lambda$ should be relatively large. For our two experiments, $\lambda$ = 3 x 10“"^ and 67 are used. $\gamma$ influences
the convergence speed, and we adopt $\gamma = 1.1$ as is often done in the literature. For fair comparison, we follow
the experimental settings in [13]. We stop the program when a maximum of 100 iterations or a relative
difference of 10“^ is reached. The experiments are implemented on an Intel Core i5 2.3 GHz MacBook Pro
2011 with 4 GB of memory. The code is available at: https://github.com/sckangz/logdet.
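Because cluster labels are arbitrary, computing the segmentation error rate requires matching predicted clusters to ground-truth classes first. A minimal sketch using only the Python standard library is shown below; it tries every one-to-one matching, which is an assumption about how the matching is done and is only practical for a small number of clusters (an assignment solver such as the Hungarian algorithm would be the usual choice otherwise). The function name is hypothetical.

```python
from itertools import permutations

def segmentation_error_rate(true_labels, pred_labels):
    """Percentage of misclustered samples under the best one-to-one label matching."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad with dummy classes so that every predicted cluster can be matched.
    classes += [("dummy", i) for i in range(len(clusters) - len(classes))]
    best_correct = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best_correct = max(best_correct, correct)
    return 100.0 * (len(true_labels) - best_correct) / len(true_labels)

# Six samples, two classes; the prediction swaps the label names and misplaces one sample.
truth = [0, 0, 0, 1, 1, 1]
prediction = [1, 1, 1, 0, 0, 1]
error = segmentation_error_rate(truth, prediction)
print(f"{error:.2f}%")
```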

A. Face clustering 

EYaleB consists of 2,414 frontal face images of 38 individuals under 64 lighting conditions. The task
is to cluster these images into their individual subspaces by identity. EYaleB is challenging for subspace
clustering due to large noise or corruptions, which can be seen from the sample images in Figure 1. As in [13],
we model noise with $\|E\|_1$. Each image is resized to a 2016-dimensional vector. We divide the 38 subjects
into four groups, i.e., 1 to 10, 11 to 20, 21 to 30, and 31 to 38, and consider all choices of $n \in \{2, 3, 5, 8\}$
subjects for each group and all choices of $n = 10$ subjects in the first three groups. There will be {163, 416, 812, 136, 3}
datasets for $n = 2, 3, 5, 8, 10$, respectively.
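The dataset counts above are sums of binomial coefficients over the four groups, which have 10, 10, 10, and 8 subjects. A short check using Python's standard library (the variable names are illustrative):

```python
from math import comb

# Sizes of the four subject groups: 1-10, 11-20, 21-30, and 31-38.
group_sizes = [10, 10, 10, 8]

def count_datasets(n):
    """Number of ways to pick n subjects from within a single group, summed over groups."""
    # comb(8, 10) is 0, so n = 10 automatically uses only the first three groups.
    return sum(comb(size, n) for size in group_sizes)

counts = [count_datasets(n) for n in (2, 3, 5, 8, 10)]
print(counts)  # [163, 416, 812, 136, 3]
```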

Mean and median error rates for the datasets corresponding to each $n$ are reported in Table I. It can
be seen that CLAR outperforms the other methods significantly. As more subjects are involved, the error
rate of CLAR remains at a low level, while those of the other methods increase drastically. In particular,
in the most challenging case of 10 subjects, the mean clustering error rate of CLAR is 3.85%, which
is an improvement of 7.09% over the best competing result, provided by SSC. This implies that our method is robust
to in-sample outliers. In Table I, we also observe that the clustering error rates of LSA are much larger
than those of the other methods. This is potentially because LSA is based on MSE, which is heavily influenced

¹ As we confirmed with an author of [14], the power 2 of equation (12) in [14] is a typo, which should be 4.
Fig. 2. Examples of recovery results of face images. The three columns from left to right are the original image (X), the error matrix (E),
and the recovered image (XZ), respectively.


by outliers. In addition, the advantage of our method is much more significant with respect to other
low-rank representation based algorithms such as LRR and LRSC; for example, there is an 11% and 19%
improvement over LRR in the cases of 8 and 10 subjects, respectively. This verifies the importance of
a good rank approximation.

Figure 2 shows some recovery results from the 10-subject clustering scenario. As we can see, the error
term E is indeed sparse and it helps remove the shadows.

TABLE I

Clustering error rate (%) on the EYaleB dataset.

| Subjects | Statistic | LRR | SSC | LSA | LRSC | CLAR |
|---|---|---|---|---|---|---|
| 2 | Mean | 2.54 | 1.86 | 32.80 | 5.32 | 1.27 |
| 2 | Median | 0.78 | 0.00 | 47.66 | 4.69 | 0.78 |
| 3 | Mean | 4.21 | 3.10 | 52.29 | 8.47 | 1.92 |
| 3 | Median | 2.60 | 1.04 | 50.00 | 7.81 | 1.56 |
| 5 | Mean | 6.90 | 4.31 | 58.02 | 12.24 | 2.64 |
| 5 | Median | 5.63 | 2.50 | 56.87 | 11.25 | 2.19 |
| 8 | Mean | 14.34 | 5.85 | 59.19 | 23.72 | 3.36 |
| 8 | Median | 10.06 | 4.49 | 58.59 | 28.03 | 3.03 |
| 10 | Mean | 22.92 | 10.94 | 60.42 | 30.36 | 3.85 |
| 10 | Median | 23.59 | 5.63 | 57.50 | 28.75 | 3.44 |




Fig. 3. Sample images in the Hopkins 155 database. Trackers are denoted by different colors.
B. Motion segmentation

In this subsection, we evaluate the robustness of CLAR for the motion segmentation problem, which is
an important step in video sequence analysis. Given multiple image frames of a dynamic scene, motion
segmentation is to cluster the points in those views into different motions undertaken by the moving
objects. The Hopkins 155 motion database contains 155 video sequences along with features extracted and
tracked in all frames for each sequence. Since the trajectories associated with each motion reside in a
distinct affine subspace of dimension $d \le 3$, every motion corresponds to a subspace. Figure 3 gives some
sample images. $\|E\|_F^2$ is applied to model the noise. Table II shows the clustering results on the Hopkins

TABLE II

Segmentation error rate (%) and mean computational time (s) on the Hopkins 155 dataset.

| Motions | Statistic | LRR | SSC | LSA | LRSC | CLAR |
|---|---|---|---|---|---|---|
| 2 | Mean | 2.13 | 1.52 | 4.23 | 3.69 | 1.32 |
| 2 | Median | 0.00 | 0.00 | 0.56 | 0.29 | 0.00 |
| 3 | Mean | 4.03 | 4.40 | 7.02 | 7.69 | 2.60 |
| 3 | Median | 1.43 | 0.56 | 1.45 | 3.80 | 0.51 |
| All | Mean | 2.56 | 2.18 | 4.86 | 4.59 | 1.61 |
| All | Median | 0.00 | 0.00 | 0.89 | 0.60 | 0.00 |
| Average Time | | 6.44 | 5.09 | 17.17 | 0.70 | 3.80 |




Fig. 4. The influence of the parameter $\lambda$ of CLAR on all 155 sequences of Hopkins 155.


155 dataset. CLAR achieves the best results in all cases. Specifically, the average clustering error rate is
1.32% for two motions and 2.60% for three motions. We also show the computational time in Table II.
As we can see, our computational time is less than that of LRR, SSC, and LSA, though more than that of LRSC. Figure
4 demonstrates the sensitivity of our algorithm to $\lambda$. It shows that the performance of CLAR is quite
stable while $\lambda$ varies over a fairly large range. We also test $\gamma$ with values 1.05 and 1.2, which do not make
much difference in the error rate. Since our problem is nonconvex, we repeat the experiments using different
random initializations and still get similar results after tuning the parameters. Thus, CLAR appears
quite insensitive to initializations.


V. Conclusion 

In this paper, we study the matrix rank minimization problem with a log-determinant approximation.
This surrogate approximates the rank function better than the nuclear norm. As an application, we study its use for the
robust subspace clustering problem. A minimization algorithm, based on a type of augmented Lagrangian
multipliers method, is developed to optimize the associated nonconvex objective function. Extensive
experiments on face clustering and motion segmentation demonstrate the effectiveness and robustness
of the proposed method, which shows superior performance when compared to state-of-the-art subspace
clustering methods.